Semi-Supervised Active Clustering with Weak Oracles
Authors
Abstract
Semi-supervised active clustering (SSAC) uses the knowledge of a domain expert to cluster data points by interactively making pairwise “same-cluster” queries. However, it is impractical to ask human oracles to answer every pairwise query. In this paper, we study the influence of allowing “not-sure” answers from a weak oracle and propose algorithms that handle these uncertainties efficiently. Different model assumptions are analyzed to cover realistic scenarios of oracle abstention. In the first model, the random-weak oracle, the oracle abstains at random with a certain probability. We also propose two distance-weak oracle models, which simulate an oracle that gets confused depending on the distance between the two points in a pairwise query. For each weak oracle model, we show that a small query complexity suffices for effective k-means clustering with high probability. Sufficient conditions for the guarantee include a γ-margin property of the data and the existence of a point close to each cluster center. Furthermore, we provide a sample complexity with a reduced effect of the cluster margin and only a logarithmic dependence on the data dimension. Our results allow significantly fewer same-cluster queries when the margin of the clusters is tight, i.e., γ ≈ 1. Experimental results on synthetic data show that our approach is effective in overcoming uncertainties.
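To make the random-weak oracle model concrete, the following is a minimal simulation sketch, not the paper's algorithm. The names `make_random_weak_oracle`, `first_definite_answer`, `p_abstain`, and `seed` are hypothetical. The sketch also assumes that the answer for a given pair is drawn once and then fixed, so repeating the same query does not help; the abstract does not state this.

```python
import random
from typing import Callable, Dict, Optional, Sequence, Tuple

# True = same cluster, False = different cluster, None = "not-sure"
Answer = Optional[bool]


def make_random_weak_oracle(
    labels: Sequence[int], p_abstain: float, seed: int = 0
) -> Callable[[int, int], Answer]:
    """Build a same-cluster oracle that abstains with probability p_abstain.

    Assumption (not from the abstract): each unordered pair gets one fixed
    answer, cached so that re-asking returns the same response.
    """
    rng = random.Random(seed)
    cache: Dict[Tuple[int, int], Answer] = {}

    def oracle(i: int, j: int) -> Answer:
        key = (min(i, j), max(i, j))  # queries are symmetric in (i, j)
        if key not in cache:
            if rng.random() < p_abstain:
                cache[key] = None  # the oracle answers "not-sure"
            else:
                cache[key] = labels[i] == labels[j]
        return cache[key]

    return oracle


def first_definite_answer(
    oracle: Callable[[int, int], Answer], i: int, candidates: Sequence[int]
) -> Optional[Tuple[int, bool]]:
    """Query point i against each candidate until the oracle does not abstain.

    Returns (candidate, answer) for the first definite answer, or None if
    the oracle abstains on every candidate.
    """
    for j in candidates:
        answer = oracle(i, j)
        if answer is not None:
            return j, answer
    return None


if __name__ == "__main__":
    true_labels = [0, 0, 1, 1]  # hypothetical ground-truth clustering
    oracle = make_random_weak_oracle(true_labels, p_abstain=0.3)
    print(first_definite_answer(oracle, 0, [1, 2, 3]))
```

Because a repeated query returns the same cached answer in this sketch, the only way around a “not-sure” response is to query a different partner point, which is why the helper iterates over several candidates.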
Similar resources
Relaxed Oracles for Semi-Supervised Clustering
Pairwise “same-cluster” queries are one of the most widely used forms of supervision in semi-supervised clustering. However, it is impractical to ask human oracles to answer every query correctly. In this paper, we study the influence of allowing “not-sure” answers from a weak oracle and propose an effective algorithm to handle such uncertainties in query responses. Two realistic weak oracle mo...
Active, semi-supervised learning to utilize human oracles
We present an approach to interactive machine learning, in which unlabeled data is employed in conjunction with active learning to better utilize the valuable resources that the human oracles provide. We empirically evaluate the approach in two very different applications, smartphone interruptibility prediction and semantic parsing. In both applications, we show that the use of active, semi-sup...
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
An Improved Semi-supervised Clustering Algorithm Based on Active Learning
To address two difficulties of traditional semi-supervised clustering algorithms, namely cluster deviation and high-dimensional data processing, a semi-supervised clustering algorithm based on active learning is proposed; the algorithm can effectively solve both problems. Using active learning strategies in the algorithm can obtain a large amount of in...
A confidence-based active approach for semi-supervised hierarchical clustering
Semi-supervised approaches have proven to be effective in clustering tasks. They allow user input, thus improving the quality of the clustering obtained, while maintaining a controllable level of user intervention. Despite being an important class of algorithms, hierarchical clustering has been little explored in semi-supervised solutions. In this report, we address the problem of semi-supervise...
Journal title:
- CoRR
Volume abs/1709.03202, Issue -
Pages -
Publication date 2017